A Note on the Conversion of Item Parameters Standard Errors


Abstract 


The relations among alternative parameterizations of the binary factor analysis (FA) model and
the two-parameter logistic (2PL) item response theory (IRT) model have been thoroughly discussed
in the literature (e.g., Lord & Novick, 1968; Takane & de Leeuw, 1987; McDonald, 1999; Wirth &
Edwards, 2007; Kamata & Bauer, 2008). However, the widely available conversion formulas
mainly transform parameter estimates from one parameterization to another. The conversion of
standard errors (SEs) among different parameterizations has received little discussion,
even though the SEs of IRT model parameters are often of immediate interest to practitioners. This paper
provides general formulas for computing the SEs of transformed parameter values when the
parameters are transformed from FA to IRT models. These formulas are suitable for
unidimensional 2PL, multidimensional 2PL, and bi-factor 2PL models. A simulation study is
conducted to verify the formulas empirically, and a real data example is given at
the end for illustration.

Standard errors of item parameter estimates in item response theory (IRT), or more
generally error covariance matrices, play an important role because the uncertainty in item 
parameter estimates is often carried over in subsequent analysis, such as test form assembly, IRT 
scoring, equating and linking (e.g., Cheng & Yuan, 2010; Mislevy, Wingersky, & Sheehan, 
1993; Thissen & Wainer, 1990). Moreover, obtaining the SEs is also a prerequisite for 
conducting hypothesis testing to evaluate differential item functioning (Cai, Yang, & Hansen, 
2011; Woods, Cai & Wang, 2013) or item parameter drift (Bock, Muraki, & Pfeiffenberger,
1988), as well as for developing asymptotic adjustments for limited-information goodness-of-fit
statistics (Cai, 2008; Cai, Maydeu-Olivares, Coffman, & Thissen, 2006).

In IRT, when the full-information maximum likelihood (FIML) estimation method is used,
the parameter error covariance matrix can be computed as the inverse of the Fisher information
matrix. Based on the statistical theory from standard discrete multivariate analysis (Rao,
1973), the error covariance matrix computed this way is considered the “gold standard” (Tian, Cai,
Thissen, & Xin, 2012). However, the computational burden of this Fisher information based SE
(FISE) increases drastically as test length increases because the number of possible response
patterns increases exponentially. Two computationally feasible alternatives are the empirical
cross-product approach (XPD) and the supplemented expectation maximization (SEM) approach
(Cai, 2008). Past research has demonstrated that XPD, though computationally the most efficient,
works well only when the sample size is much larger than the test length; otherwise, it produces
upward bias (Paek & Cai, 2014). The SEM approach, on the other hand, is based on numerically
differentiating an implicit function defined by the EM iterations (i.e., the EM map), and it generally
performs well under a variety of conditions.


Item factor analysis (FA), which is rooted in categorical confirmatory factor analysis,
offers an alternative to the IRT parameterization. It assumes that ordered-categorical item responses
are discrete representations of underlying continuous latent responses. Unlike FIML, which
is often used in the IRT framework, weighted least squares (WLS) estimation is usually adopted
within the FA framework. A challenge with WLS is that the size of the optimal weight matrix
increases rapidly with test length and becomes exceedingly large; hence, a statistically
less efficient yet computationally more feasible alternative is the diagonally weighted least
squares method (Satorra & Bentler, 1990; Wirth & Edwards, 2007). This reduction in efficiency
leads to biased standard errors, and hence the robust standard error is recommended.

Previous research has discussed the advantages and limitations of both the IRT and FA
frameworks, along with their preferred estimation algorithms, i.e., FIML versus WLS¹ (Wirth &
Edwards, 2007). Transformation formulas are available for practitioners to transform parameter
estimates from one framework to the other (Kamata & Bauer, 2008; Wang, Kohli, & Henn,
2016). However, when the parameters from the FA model are transformed to the IRT
parameterization, there is a lack of documentation on how to compute the SEs of the transformed
parameters (i.e., convert the standard errors) accordingly, and this note intends to fill the gap.
Forero and Maydeu-Olivares (2009) mentioned using the delta method for conversion, but the
details were not provided. We believe the conversion formulas will be useful in at least three
scenarios: (1) when researchers choose to estimate the FA model via WLSMV (either because of
their familiarity with the FA model or because WLSMV is much faster with high-dimensional
models) but later want to report the IRT item parameter estimates, the SEs of the transformed
parameter values can be obtained directly from the plug-in equations provided in this note; (2)
when researchers want to compare the performance of a sandwich estimator versus XPD/SEM,
the transformation is needed to put SEs from the two models on the same metric; (3) when
researchers want to compare the performance of WLSMV and MML with respect to the SEs of
item parameter estimates, the transformation formula facilitates a direct comparison, whereas
prior studies had to conduct hundreds of replications to obtain the empirical standard deviation
(e.g., Finch, 2010; DeMars, 2012).

¹ Other estimation methods are also available, such as Monte Carlo EM, Markov chain Monte Carlo, etc., but
they are not the focus of this note.

To align with Kamata and Bauer’s (2008) argument, the conversion formulas in this note
are provided for four different FA parameterizations (marginal vs. conditional, and reference
indicator vs. standardized factor) that cover the majority of applications. The conversion
formulas are also general enough to be used with unidimensional, multidimensional, and
bi-factor models. The rest of the paper is organized as follows. We first briefly introduce the IRT
model and the factor analytic model along with the four parameterizations, which
were extensively discussed in Kamata and Bauer (2008). We then discuss the
standard error transformation and provide the conversion table. A simulation
study follows, and then a real data example. Conclusions are given at the end.

IRT Models, FA Models, and Four Parameterizations 
In this section, we will briefly introduce the IRT models, factor analytic models, and the four 
different parameterizations (Kamata & Bauer, 2008). 
IRT and FA Models 
Starting with the simplest unidimensional IRT (UIRT) two-parameter model, the item
response function is defined as

$$P_{ij} = P(y_{ij} = 1 \mid a_i, d_i, \theta_j) = f(a_i \theta_j + d_i), \tag{1}$$

where $y_{ij}$ denotes the item response of person $j$ to item $i$. In Equation (1), $a_i$ and $d_i$ are known as
the item slope (discrimination) and intercept (threshold) parameters, respectively, whereas $\theta_j$
denotes the latent trait of person $j$. $f(\cdot)$ is the cumulative distribution function (CDF), chosen to
be either a normal ogive or logistic CDF. Here we use the intercept-slope notation to make the
notation consistent with the other models. For instance, the multidimensional IRT (MIRT) model is a
natural extension of UIRT, with the item response function defined as

$$P_{ij} = P(y_{ij} = 1 \mid \mathbf{a}_i, d_i, \boldsymbol{\theta}_j) = f(\mathbf{a}_i^T \boldsymbol{\theta}_j + d_i), \tag{2}$$

where $\boldsymbol{\theta}_j = (\theta_{j1}, \theta_{j2}, \ldots, \theta_{jK})^T$ denotes a column vector of $K$ latent traits, and $\mathbf{a}_i$ is a vector of
$K$ slope parameters. This notation is used throughout Reckase (2009) (Equation 4.5, p. 86). In
a general MIRT model, an item can load on either one of the $K$ dimensions (i.e., simple
structure) or on multiple dimensions (i.e., complex structure), and all dimensions of $\boldsymbol{\theta}$ are
correlated.
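The item response functions in Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names (`normal_cdf`, `logistic_cdf`, `irf`) and the numeric values are hypothetical and are not taken from the paper or its online appendix.

```python
import math

def normal_cdf(x):
    """Standard normal CDF (normal ogive link)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf(x):
    """Logistic CDF (logistic link)."""
    return 1.0 / (1.0 + math.exp(-x))

def irf(a, d, theta, link=logistic_cdf):
    """P(y = 1 | a, d, theta) = f(a' theta + d), Equation (2).

    a and theta are sequences of length K; K = 1 gives Equation (1).
    """
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return link(z)

# Hypothetical unidimensional item (K = 1).
p_uni = irf([1.2], -0.5, [0.0])

# Hypothetical item that loads on two of four dimensions.
p_multi = irf([1.0, 0.0, 0.8, 0.0], 0.3, [0.5, -1.0, 0.2, 1.5])
```

Passing `link=normal_cdf` switches to the normal ogive form; the slope-intercept argument `a' theta + d` is the same under either link.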

The bi-factor model, originally proposed by Holzinger and Swineford (1937) and
popularized by Gibbons and Hedeker (1992), represents a unique type of factor structure. In
this structure, each item loads on one general factor and one specific factor, and all factors are
independent (Reise, 2012). Hence, only two elements in $\mathbf{a}_i$ are non-zero.

For readers interested in the traditional discrimination-difficulty notation for the
unidimensional two-parameter model,

$$P(y_{ij} = 1 \mid a_i, b_i, \theta_j) = f(a_i(\theta_j - b_i)), \tag{3}$$

or the simple structure MIRT model assuming item $i$ measures the $k$th latent trait, i.e.,

$$P(y_{ij} = 1 \mid a_{ik}, b_{ik}, \theta_{jk}) = f(a_{ik}(\theta_{jk} - b_{ik})), \tag{4}$$

the conversion formulas for computing the SE of $b_i$ (or $b_{ik}$) are provided as well.


All different two-parameter IRT models can be reparameterized in the FA framework.
Let $y_{ij}^*$ denote the continuous latent response variable governing the observed binary
response $y_{ij}$; then $y_{ij}^*$ is written in the additive linear form

$$y_{ij}^* = \nu_i + \lambda_{i1}\xi_{j1} + \cdots + \lambda_{iK}\xi_{jK} + \varepsilon_{ij}. \tag{5}$$

In Equation (5), $\nu_i$ is the intercept, usually fixed at zero for identification; $\lambda_{ik}$ is the factor
loading on the $k$th factor corresponding to item $i$; $\xi_{jk}$ is the latent factor score for individual $j$ on
factor $k$; and $\varepsilon_{ij}$ is the residual for person $j$ on item $i$. $y_{ij}^*$ is then dichotomized to form the binary
observed $y_{ij}$ based on the following rule

$$y_{ij} = \begin{cases} 1 & \text{if } y_{ij}^* \ge \tau_i \\ 0 & \text{if } y_{ij}^* < \tau_i, \end{cases} \tag{6}$$

where $\tau_i$ is the threshold for item $i$. If a unidimensional model is considered, then $K = 1$.
For the bi-factor model, $K$ equals the number of group factors plus one.
Four Parameterizations 

The exposition in this section closely mirrors Kamata and Bauer (2008); readers
familiar with this reference can skip this section. In item factor analysis, the scale of the latent
response variable can be fixed by two different parameterizations. On one hand, the variance of
$y_{ij}^*$ is constrained to be 1 for all items such that the residual variance, $V(\varepsilon_{ij})$, is estimated as
$V(\varepsilon_{ij}) = 1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i$. This unit variance constraint for $y_{ij}^*$ is rooted in the weighted least
squares estimation method for binary FA, which involves the use of tetrachoric correlations
(Kamata & Bauer, 2008). In fact, the tetrachoric correlation matrix is essentially a covariance
matrix between underlying latent response variables with unit variance (Millsap & Yun-Tein,
2004). Following the naming convention in Kamata and Bauer (2008), this parameterization
fixes the marginal distribution of the continuous latent variable $y_{ij}^*$, and hence it is referred to as
the marginal parameterization. On the other hand, if the residual variance $V(\varepsilon_{ij})$ is fixed to 1, then
the marginal variance of $y_{ij}^*$ is computed as $V(y_{ij}^*) = 1 + \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i$. This parameterization is
more in line with the convention in the probit regression model, and it is referred to as the
conditional parameterization.
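To make the two scalings concrete, the following Python sketch computes the response probability implied by Equations (5) and (6) under either parameterization. It assumes a normally distributed residual, a zero intercept, and the residual variances stated above; all function names and numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quad_form(lam, cov):
    """lam' cov lam for a loading vector and a K x K factor covariance matrix."""
    K = len(lam)
    return sum(lam[s] * cov[s][t] * lam[t] for s in range(K) for t in range(K))

def residual_variance(lam, cov, marginal):
    """V(eps): 1 - lam' cov lam (marginal scaling) or fixed at 1 (conditional scaling)."""
    return 1.0 - quad_form(lam, cov) if marginal else 1.0

def response_probability(lam, tau, xi, cov, marginal):
    """P(y = 1 | xi) = P(y* >= tau | xi) with y* = lam' xi + eps, eps ~ N(0, V(eps))."""
    linear = sum(l * x for l, x in zip(lam, xi))
    return normal_cdf((linear - tau) / math.sqrt(residual_variance(lam, cov, marginal)))

# One standardized factor; the same item written in both scalings (made-up values).
cov = [[1.0]]
p_marginal = response_probability([0.6], 0.3, [1.0], cov, marginal=True)
p_conditional = response_probability([0.75], 0.375, [1.0], cov, marginal=False)
```

The two calls describe the same item: dividing the marginal loading and threshold by the residual standard deviation 0.8 gives the conditional values, so both return the same probability.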

As both the marginal and conditional parameterizations fix the scale of $y_{ij}^*$, to further
identify the model, the scale of the latent factors also has to be fixed. Note that scaling $y_{ij}^*$ is only
needed for FA, whereas scaling $\boldsymbol{\xi}$ (or $\boldsymbol{\theta}$ in IRT) is needed for both FA and IRT. Two widely used
scaling conventions, in the unidimensional scenario, are to standardize the common factor or to
choose a reference indicator. When a multidimensional model is considered (bi-factor structure
included), one must standardize all $K$ factors, where $K$ refers to the total number of factors. With
the choice of reference indicator, at least $K$ items need to be selected as references whose
parameters are fixed. In confirmatory item factor analysis, when the item-factor loading
structure is pre-determined, no further constraints are needed to remove rotational indeterminacy.
Taken together, Table 1 summarizes the four different parameterizations. In Mplus (Muthén &
Muthén, 1998-2015), the marginal parameterization is referred to as the “DELTA” parameterization,
and the conditional parameterization as the “THETA” parameterization.


Table 1. Summary of four parameterizations.

                      Reference Indicator                                           Standardized Factor
Marginal (DELTA)      $\lambda_{ik} = 1$, $\tau_{ik} = 0$ for $k = 1, \ldots, K$      $E(\xi_k) = 0$, $V(\xi_k) = 1$ for $k = 1, \ldots, K$
                      $V(y^*) = 1$                                                  $V(y^*) = 1$
Conditional (THETA)   $\lambda_{ik} = 1$, $\tau_{ik} = 0$ for $k = 1, \ldots, K$      $E(\xi_k) = 0$, $V(\xi_k) = 1$ for $k = 1, \ldots, K$
                      $V(\varepsilon) = 1$                                          $V(\varepsilon) = 1$

Note. $y^*$ and $\varepsilon$ have no subscript, and they refer to all $(i, j)$'s.


The conversion between the FA and IRT parameterizations is well established for the
unidimensional model (e.g., Takane & de Leeuw, 1987) and for multidimensional models
(e.g., McDonald, 1999; Finch, 2010). Following the same derivations as in Kamata and Bauer
(2008), we arrive at the general conversion formulas for multidimensional models in Table
2. When $K = 1$, the formulas are exactly the same as those in Table 2 of Kamata and Bauer
(2008, p. 144). In Table 2, $E(\xi_k)$ and $V(\xi_k)$ denote the mean and variance of the $k$th factor,
$\xi_k$; $\boldsymbol{\lambda}_i$ denotes the column vector of loading parameters of item $i$; and $\mathrm{cov}(\boldsymbol{\xi})$ denotes the
covariance matrix of the factors. The last row in Table 2 refers to the discrimination-difficulty
notation, in which the conversions for the a-parameter stay the same as in the slope-intercept
parameterization. The conversions for the b-parameter are presented, and they are the
same for both marginal and conditional parameterizations.


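Table 2. Conversion formulas for four factor analysis parameterizations

Marginal
    Reference Indicator:   $a_{ik} = \dfrac{\lambda_{ik}\sqrt{V(\xi_k)}}{\sqrt{1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i}}$,   $d_i = \dfrac{-[\tau_i - \boldsymbol{\lambda}_i^T E(\boldsymbol{\xi})]}{\sqrt{1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i}}$
    Standardized Factor:   $a_{ik} = \dfrac{\lambda_{ik}}{\sqrt{1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i}}$,   $d_i = \dfrac{-\tau_i}{\sqrt{1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i}}$

Conditional
    Reference Indicator:   $a_{ik} = \lambda_{ik}\sqrt{V(\xi_k)}$,   $d_i = -[\tau_i - \boldsymbol{\lambda}_i^T E(\boldsymbol{\xi})]$
    Standardized Factor:   $a_{ik} = \lambda_{ik}$,   $d_i = -\tau_i$

Discrimination-Difficulty Notation (Equations 3 & 4)
    Reference Indicator:   $b_{ik} = \dfrac{\tau_i - \lambda_{ik} E(\xi_k)}{\lambda_{ik}\sqrt{V(\xi_k)}}$
    Standardized Factor:   $b_{ik} = \dfrac{\tau_i}{\lambda_{ik}}$

Note. When logistic CDF is used, all above transformation formulas need to be multiplied by a constant 1.7.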
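The conversions can be sketched in Python as follows. The sketch assumes the Table 2 formulas take the form $a_{ik} = \lambda_{ik}\sqrt{V(\xi_k)}/s_i$ and $d_i = -[\tau_i - \boldsymbol{\lambda}_i^T E(\boldsymbol{\xi})]/s_i$, with $s_i = \sqrt{1 - \boldsymbol{\lambda}_i^T \mathrm{cov}(\boldsymbol{\xi}) \boldsymbol{\lambda}_i}$ under the marginal parameterization and $s_i = 1$ under the conditional one; the standardized-factor case is then obtained by passing zero means and unit variances. The function names and numeric values are hypothetical, and the 1.7 scaling constant for the logistic CDF is deliberately left out.

```python
import math

def fa_to_irt(lam, tau, mean, cov, marginal):
    """Convert one item's FA parameters to slope-intercept IRT parameters.

    lam: K loadings; tau: threshold; mean: K factor means;
    cov: K x K factor covariance matrix; marginal: True for DELTA, False for THETA.
    """
    K = len(lam)
    quad = sum(lam[s] * cov[s][t] * lam[t] for s in range(K) for t in range(K))
    scale = math.sqrt(1.0 - quad) if marginal else 1.0
    a = [lam[k] * math.sqrt(cov[k][k]) / scale for k in range(K)]
    d = -(tau - sum(l * m for l, m in zip(lam, mean))) / scale
    return a, d

def difficulty(lam_k, tau, mean_k, var_k):
    """b-parameter for a simple-structure item measuring factor k."""
    return (tau - lam_k * mean_k) / (lam_k * math.sqrt(var_k))

# The same hypothetical item under two standardized-factor parameterizations.
a_m, d_m = fa_to_irt([0.6], 0.3, [0.0], [[1.0]], marginal=True)
a_c, d_c = fa_to_irt([0.75], 0.375, [0.0], [[1.0]], marginal=False)
b = difficulty(0.6, 0.3, 0.0, 1.0)
```

Both calls return a slope of 0.75 and an intercept of -0.375 (up to rounding), and the difficulty equals $-d_i/a_i$, which is why the b-parameter conversion does not depend on the marginal versus conditional choice.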


Standard Error Conversions 
In the FA framework, limited-information WLS is often used. Instead of using the
raw response patterns as in MMLE/EM, WLS uses the first-order and second-order marginal
proportions obtained from the response contingency tables to facilitate parameter estimation. The
primary idea is to find item parameter values that minimize the weighted deviations
between the model-implied correlation matrix and the sample tetrachoric correlation matrix.
Because WLS usually requires a large sample to precisely estimate a full, optimal weight matrix
(Muthén, du Toit, & Spisic, 1997), researchers have suggested using only the diagonal of the
weight matrix for estimation, leading to the so-called diagonally weighted (or modified) WLS
estimators. Due to this “misspecification” of the weight matrix, the resulting standard errors are
biased. As a remedy, the robust standard error via the Huber sandwich estimator is used to
correct for the specification error (Satorra & Bentler, 1990; Muthén & Muthén, 2015). With
high-dimensional models, WLS is much faster than FIML (e.g., Wang et al., 2016; Wang, Su, &
Weiss, 2018).

In this section, we provide the conversion formulas for SE transformation in Table 3. The
multivariate delta method (Casella & Berger, 2002) is used to obtain the SEs of the transformed
FA parameter values when they are transformed to the IRT parameters in Table 2. Taking item $i$ as
an example, $\mathbf{a}_i = g(\tau_i, \boldsymbol{\lambda}_i, E(\boldsymbol{\xi}), \mathrm{cov}(\boldsymbol{\xi}))$, and the specific form of the function $g(\cdot)$ is provided
in Table 2. Then, given the error covariance matrix obtained via the sandwich estimator from an
FA model, denoted as $\Sigma_{FA}$, the standard error of $\mathbf{a}_i$, the transformed parameter, can be obtained
via the multivariate delta method as follows

$$SE(\mathbf{a}_i) = \sqrt{\Gamma_{a_i}^T \times \Sigma_{FA} \times \Gamma_{a_i}}. \tag{7}$$

In Equation (7), the superscript “T” denotes the matrix transpose, and $\Gamma_{a_i}$ contains the first derivatives of
$\mathbf{a}_i = g(\tau_i, \boldsymbol{\lambda}_i, E(\boldsymbol{\xi}), \mathrm{cov}(\boldsymbol{\xi}))$ with respect to the model parameters, i.e., $\boldsymbol{\lambda}_i$, $\tau_i$, $E(\boldsymbol{\xi})$, $\mathrm{cov}(\boldsymbol{\xi})$. The
error covariance matrix $\Sigma_{FA}$ is a $Q$-by-$Q$ matrix output from the WLS estimation procedure,
where $Q$ stands for the total number of parameters in a model. A concrete example is given in
Table 5 below. The same generic form in (7) applies to $d_i$ and $b_i$, but their specific forms differ
by the four parameterizations. The generic forms are presented in Table 3 and the specific forms of
$\Gamma_{a_i}$ and $\Gamma_{d_i}$ ($\Gamma_{b_i}$) are presented in the Appendix. This multivariate delta method is implemented
in R (R Core Team, 2016), and the script file is available in the online appendix for interested
users.
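As a rough illustration of Equation (7), the Python sketch below applies the delta method with a numerically differentiated Jacobian. This is not the authors' R script: it swaps the analytic derivatives of the Appendix for central finite differences, takes the square roots of the diagonal of the resulting covariance matrix, and uses a made-up 2-by-2 error covariance matrix for a single unidimensional item under the marginal-standardized parameterization (assuming $a_i = \lambda_i/\sqrt{1-\lambda_i^2}$ and $d_i = -\tau_i/\sqrt{1-\lambda_i^2}$).

```python
import math

def numerical_jacobian(g, x, h=1e-6):
    """Central-difference Jacobian of g: R^Q -> R^M, as M rows of length Q.

    Each row is the transpose of one column of the Gamma matrix in Equation (7).
    """
    n_out = len(g(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for q in range(len(x)):
        up, down = list(x), list(x)
        up[q] += h
        down[q] -= h
        g_up, g_down = g(up), g(down)
        for m in range(n_out):
            jac[m][q] = (g_up[m] - g_down[m]) / (2.0 * h)
    return jac

def delta_method_se(g, x, sigma):
    """SEs of g(x): square roots of the diagonal of J Sigma J'."""
    Q = len(x)
    ses = []
    for row in numerical_jacobian(g, x):
        var = sum(row[s] * sigma[s][t] * row[t] for s in range(Q) for t in range(Q))
        ses.append(math.sqrt(var))
    return ses

def g(x):
    """Hypothetical item: x = (lambda, tau) -> (a, d), marginal-standardized, K = 1."""
    lam, tau = x
    s = math.sqrt(1.0 - lam ** 2)
    return [lam / s, -tau / s]

# Made-up sandwich covariance matrix of (lambda, tau); not from any real fit.
sigma = [[0.0016, 0.0002],
         [0.0002, 0.0025]]
se_a, se_d = delta_method_se(g, [0.6, 0.3], sigma)
```

With these numbers the slope SE is the loading SE (0.04) scaled by $(1 - 0.36)^{-3/2}$, so the SE nearly doubles on the IRT metric. In a full model the parameter vector has length $Q$ and most Jacobian entries for a given item are zero.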


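In Table 3, $\mathrm{vec}\left(\frac{\partial a_{ik}}{\partial (E(\boldsymbol{\xi}))_s}\right)$ denotes the vectorized elements of the first derivatives, which
is a $K$-by-1 vector with the $s$th element being $\frac{\partial a_{ik}}{\partial (E(\boldsymbol{\xi}))_s}$. Similarly, taking the first derivative of
$a_{ik}$ with respect to the lower-triangular elements of the factor covariance matrix $\mathrm{cov}(\boldsymbol{\xi})$ and stacking
them into a $K(K+1)/2$-by-1 vector results in $\mathrm{vec}\left(\frac{\partial a_{ik}}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right)$. Here the $(st)$ in the subscript
denotes the $(s, t)$th element of the covariance matrix. Please note that although the full size of $\Gamma_{a_i}$
(or $\Gamma_{d_i}$) could be large, there will be a lot of 0’s in the vector. That is, for item $i$, any derivatives
with respect to $\boldsymbol{\lambda}_h$ and $\tau_h$, where item $h \neq$ item $i$, are 0 by definition. For the full dimension of the
$\Gamma$ matrix, please refer to Table 5 for a few concrete examples. Otherwise, for the non-zero
elements in the $\Gamma$ matrix, please refer to Table 3.

Table 3. Generic forms of $\Gamma_{a_i}$ and $\Gamma_{d_i}$ in Equation (7)

Reference indicator parameterization:

$$\Gamma_{a_i} = \begin{pmatrix}
\frac{\partial a_{i1}}{\partial \tau_1} & \cdots & \frac{\partial a_{iK}}{\partial \tau_1} \\
\vdots & & \vdots \\
\frac{\partial a_{i1}}{\partial \tau_I} & \cdots & \frac{\partial a_{iK}}{\partial \tau_I} \\
\frac{\partial a_{i1}}{\partial \lambda_{11}} & \cdots & \frac{\partial a_{iK}}{\partial \lambda_{11}} \\
\vdots & & \vdots \\
\frac{\partial a_{i1}}{\partial \lambda_{IK}} & \cdots & \frac{\partial a_{iK}}{\partial \lambda_{IK}} \\
\mathrm{vec}\left(\frac{\partial a_{i1}}{\partial (E(\boldsymbol{\xi}))_s}\right) & \cdots & \mathrm{vec}\left(\frac{\partial a_{iK}}{\partial (E(\boldsymbol{\xi}))_s}\right) \\
\mathrm{vec}\left(\frac{\partial a_{i1}}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right) & \cdots & \mathrm{vec}\left(\frac{\partial a_{iK}}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right)
\end{pmatrix}, \qquad
\Gamma_{d_i} = \begin{pmatrix}
\frac{\partial d_i}{\partial \tau_1} \\
\vdots \\
\frac{\partial d_i}{\partial \tau_I} \\
\frac{\partial d_i}{\partial \lambda_{11}} \\
\vdots \\
\frac{\partial d_i}{\partial \lambda_{IK}} \\
\mathrm{vec}\left(\frac{\partial d_i}{\partial (E(\boldsymbol{\xi}))_s}\right) \\
\mathrm{vec}\left(\frac{\partial d_i}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right)
\end{pmatrix}$$

Remarks: Because $\tau_*, \lambda_{*1}, \ldots, \lambda_{*K}$ are fixed as reference indicators, their relevant
terms do not appear in the above matrix. In practice, the “reference
indicators” do not have to be the first or first $K$ items and hence “*”
denotes the unspecified reference indicator items. For MIRT/bi-factor
models, $K$ loadings have to be fixed.

Standardized factor parameterization:

$$\Gamma_{a_i} = \begin{pmatrix}
\frac{\partial a_{i1}}{\partial \tau_1} & \cdots & \frac{\partial a_{iK}}{\partial \tau_1} \\
\vdots & & \vdots \\
\frac{\partial a_{i1}}{\partial \tau_I} & \cdots & \frac{\partial a_{iK}}{\partial \tau_I} \\
\frac{\partial a_{i1}}{\partial \lambda_{11}} & \cdots & \frac{\partial a_{iK}}{\partial \lambda_{11}} \\
\vdots & & \vdots \\
\frac{\partial a_{i1}}{\partial \lambda_{IK}} & \cdots & \frac{\partial a_{iK}}{\partial \lambda_{IK}} \\
\mathrm{vec}\left(\frac{\partial a_{i1}}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right) & \cdots & \mathrm{vec}\left(\frac{\partial a_{iK}}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right)
\end{pmatrix}, \qquad
\Gamma_{d_i} = \begin{pmatrix}
\frac{\partial d_i}{\partial \tau_1} \\
\vdots \\
\frac{\partial d_i}{\partial \tau_I} \\
\frac{\partial d_i}{\partial \lambda_{11}} \\
\vdots \\
\frac{\partial d_i}{\partial \lambda_{IK}} \\
\mathrm{vec}\left(\frac{\partial d_i}{\partial (\mathrm{cov}(\boldsymbol{\xi}))_{st}}\right)
\end{pmatrix}$$

Remarks: Because the factor means are fixed as constants, the derivatives with
respect to $E(\boldsymbol{\xi})$ disappear. In addition, although we write $\mathrm{cov}(\boldsymbol{\xi})$ here, it
is really the off-diagonal terms that matter because the variances are fixed
as constants as well.

Note: The generic form of $\Gamma_{b_i}$ is the same as that of $\Gamma_{d_i}$, with $d_i$ replaced by $b_i$.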
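As a worked illustration of the non-zero elements, consider the simplest case of a unidimensional item ($K = 1$) under the marginal-standardized parameterization, assuming the conversions $a_i = \lambda_i/\sqrt{1-\lambda_i^2}$ and $d_i = -\tau_i/\sqrt{1-\lambda_i^2}$. The derivation below is our own sketch of this special case and is not copied from the Appendix; $\mathrm{Var}(\cdot)$ and $\mathrm{Cov}(\cdot,\cdot)$ denote the relevant entries of the error covariance matrix.

```latex
\begin{align*}
\frac{\partial a_i}{\partial \lambda_i} &= (1-\lambda_i^2)^{-1/2} + \lambda_i^2 (1-\lambda_i^2)^{-3/2}
  = (1-\lambda_i^2)^{-3/2}, &
\frac{\partial a_i}{\partial \tau_i} &= 0, \\
\frac{\partial d_i}{\partial \lambda_i} &= -\tau_i \lambda_i (1-\lambda_i^2)^{-3/2}, &
\frac{\partial d_i}{\partial \tau_i} &= -(1-\lambda_i^2)^{-1/2}, \\
SE(a_i) &= (1-\lambda_i^2)^{-3/2}\, SE(\lambda_i), \\
\mathrm{Var}(d_i) &= \left(\frac{\partial d_i}{\partial \lambda_i}\right)^{2} \mathrm{Var}(\lambda_i)
  + \left(\frac{\partial d_i}{\partial \tau_i}\right)^{2} \mathrm{Var}(\tau_i)
  + 2\,\frac{\partial d_i}{\partial \lambda_i}\frac{\partial d_i}{\partial \tau_i}\,
    \mathrm{Cov}(\lambda_i, \tau_i).
\end{align*}
```

Because $a_i$ does not depend on $\tau_i$, its SE is a simple rescaling of the loading SE, whereas the SE of $d_i$ also involves the covariance between the loading and threshold estimates.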


A Simulation Study 


A small-scale simulation study was conducted to evaluate whether the conversion formulas in
Table 3 work for computing the SEs of the transformed FA parameters when they
are transformed to the equivalent IRT parameterizations (based on Table 2). The models
considered were the unidimensional two-parameter logistic model (2PL), the multidimensional 2PL
(both simple and complex structures), and the bi-factor 2PL. All four parameterizations were included.
Please note that we do not intend to compare the transformed SE to the SE obtained directly
from the IRT parameterization because the comparison would be confounded by the type of
estimator, i.e., the sandwich estimator from WLSMV vs. SEM/XPD from FIML. Instead, we intend
to show that the SEs of the transformed parameter values (obtained from the multivariate delta
method) are close to the empirical, “true” SEs from simulations.
Design 

For all simulations, the logistic CDF was used as the link function in the IRT model, and the sample
size was fixed at 1,000. For the unidimensional 2PL model, test length was fixed at 15. For the
between-item multidimensional model, there were 45 items, with 15 items measuring each of
the three latent traits separately. For the within-item multidimensional model, there were 45
items that measured multiple latent traits, and each latent trait was measured by 30 items. As an
example, $\theta_1$ was measured by items 1~5, 16~25, and 31~45. For the bi-factor model, there were
45 items, each measuring one general latent trait and one of the three group latent traits. The
specific simulation design is shown in Table 4, with details regarding the distributions from
which the parameters were simulated. Two hundred replications were conducted per condition².
FA models were fitted using Mplus³, and parameters were estimated by WLS with mean and variance
adjusted (i.e., WLSMV) along with the sandwich standard errors under all four parameterizations
(Finney & DiStefano, 2006; Flora & Curran, 2004). The parameter estimates were transformed
using the conversion formulas in Table 2, and the SEs were transformed using the conversion
formula in Equation (7) and Table 3, with $\Gamma_{a_i}$ and $\Gamma_{d_i}$ taking the following dimensions for
different models:

² We conducted a sensitivity analysis and found that the empirical standard deviation stabilized at around 150
replications, hence 200 is a conservative choice.

³ Mplus was chosen because it is a widely used structural equation modeling software package. Other SEM
software packages could also be used.


Insert Table 4 Here 


Table 5. The dimension of the $\Gamma$ matrix in this study.

Model             UIRT      bi-factor   Between-item MIRT   Within-item MIRT
$\Gamma_{a_i}$    30 × 1    135 × 4     93 × 3              138 × 3
$\Gamma_{d_i}$    30 × 1    135 × 1     93 × 1              138 × 1

Note: 30 = 15 (# of items in a test) × 2 (# of parameters per item); 135 = 45 (# of items in a test) × 3 (# of parameters
per item); 93 = 45 (# of items in a test) × 2 (# of parameters per item) + 3 (correlations among factors); 138 = 30 (#
of items measuring each factor) × 3 (# of factors) + 45 (# of intercept parameters) + 3 (correlations among factors).
The number of columns in the $\Gamma$ matrix is consistent with the dimension of the parameter vector, i.e., 1 in UIRT
refers to 1 discrimination and 1 intercept parameters per item; 4 and 1 in the bi-factor model refer to 4
discrimination (1 general factor and 3 group factors) and 1 intercept parameters per item; 3 and 1 in both
between-item and within-item MIRT refer to 3 discrimination and 1 intercept parameters per item.
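The row counts in Table 5 follow from simple bookkeeping of free parameters, which the short Python sketch below reproduces. The helper name `n_free_parameters` is hypothetical, and the counts follow the arithmetic given in the table note (free loadings plus thresholds plus factor correlations).

```python
def n_free_parameters(n_loadings, n_thresholds, n_factor_correlations=0):
    """Total number of free FA parameters Q, i.e., the number of rows of Gamma."""
    return n_loadings + n_thresholds + n_factor_correlations

# UIRT: 15 items, one loading and one threshold each.
q_uirt = n_free_parameters(15, 15)

# Bi-factor: 45 items, a general and a group loading plus one threshold each.
q_bifactor = n_free_parameters(45 * 2, 45)

# Between-item MIRT: 45 items with one loading each, plus 3 factor correlations.
q_between = n_free_parameters(45, 45, 3)

# Within-item MIRT: 30 loadings on each of 3 factors, 45 thresholds, 3 correlations.
q_within = n_free_parameters(30 * 3, 45, 3)
```

The error covariance matrix from the FA fit is then $Q$-by-$Q$ with the same $Q$, so these counts also give its dimension.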


Evaluation Criteria 
To compare the SE conversions under the four parameterizations, the average root mean square error
(ARMSE), average bias, and average relative bias were calculated. The ARMSE is defined by

$$\mathrm{ARMSE} = \frac{1}{I}\sum_{i=1}^{I}\sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\sigma}_i^{r} - \sigma_i\right)^2},$$

where $\hat{\sigma}_i^{r}$ is the transformed SE estimate from the delta method for item $i$ in replication $r$, and $\sigma_i$
is the empirical standard deviation of the transformed parameters across replications, which serves as
the true value. Here, $\sigma$ is used to denote the standard error of a generic parameter, which could be a
discrimination or threshold parameter. $I$ denotes the total number of items, and $R$ denotes the
number of replications. Similarly, the average bias is computed by

$$\mathrm{bias} = \frac{1}{R}\sum_{r=1}^{R}\frac{1}{I}\sum_{i=1}^{I}\left(\hat{\sigma}_i^{r} - \sigma_i\right),$$

and the average relative bias is computed by

$$\mathrm{RB} = \frac{1}{R}\sum_{r=1}^{R}\frac{1}{I}\sum_{i=1}^{I}\frac{\hat{\sigma}_i^{r} - \sigma_i}{\sigma_i}.$$

For quality control, we also checked the parameter recovery from the FA parameterization (after transformation) as
compared to the true parameters. The purpose is to check the behavior of the conversion formulas
in Table 2. The average RMSE, bias, and relative bias were also computed for the item
parameter estimates.
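The three criteria can be computed as in the following Python sketch. It assumes that the ARMSE averages the per-item RMSE over items and that the empirical standard deviation uses the $R - 1$ denominator; neither detail is confirmed by the text, and the function names and toy numbers are hypothetical.

```python
import math

def empirical_sd(estimates):
    """Per-item SD of transformed estimates across replications.

    estimates[r][i] is the estimate for item i in replication r
    (sample SD with R - 1 in the denominator, an assumption).
    """
    R, I = len(estimates), len(estimates[0])
    sds = []
    for i in range(I):
        column = [estimates[r][i] for r in range(R)]
        mean = sum(column) / R
        sds.append(math.sqrt(sum((v - mean) ** 2 for v in column) / (R - 1)))
    return sds

def se_recovery(se_hat, sigma):
    """Return (ARMSE, bias, relative bias) of delta-method SEs against 'true' SEs.

    se_hat[r][i] is the transformed SE for item i in replication r;
    sigma[i] is the empirical SD used as the true value.
    """
    R, I = len(se_hat), len(sigma)
    armse = sum(
        math.sqrt(sum((se_hat[r][i] - sigma[i]) ** 2 for r in range(R)) / R)
        for i in range(I)
    ) / I
    bias = sum(se_hat[r][i] - sigma[i] for r in range(R) for i in range(I)) / (R * I)
    rel_bias = sum(
        (se_hat[r][i] - sigma[i]) / sigma[i] for r in range(R) for i in range(I)
    ) / (R * I)
    return armse, bias, rel_bias

# Toy input: two replications, two items.
toy_se = [[0.11, 0.19], [0.09, 0.21]]
toy_sigma = [0.10, 0.20]
armse, bias, rel_bias = se_recovery(toy_se, toy_sigma)
toy_sd = empirical_sd([[1.0, 2.0], [3.0, 2.0]])
```

In the toy input the over- and under-estimates cancel, so the bias and relative bias are zero while the ARMSE is 0.01; this is why the ARMSE is reported alongside the signed measures.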
Results 

The results are summarized in Table 6 for the unidimensional 2PL, in Table 7 for the
between-item multidimensional 2PL (M2PL), in Table 8 for the within-item M2PL, and in Table 9 for the
bi-factor 2PL. In each table, for each parameter, the first row is the parameter recovery and
the second row is the standard error recovery. Note that for the reference indicator option,
the reference items are excluded from computing the average bias and average RMSE. For the
2PL and between-item M2PL, results for both the b-parameter (i.e., Equations 3, 4) and the
d-parameter (i.e., Equations 1, 2) are included.


Insert Tables 6 to 9 Here 


For the unidimensional IRT model, the four different parameterizations in the FA framework
generated almost identical parameter estimates, as indicated by the small average bias,
relative bias, and RMSE. The standard error estimates in the FA framework, after
transformation, aligned well with the “true”, empirical standard errors. There was no appreciable
difference among the four parameterizations in terms of standard error recovery for the a-, d-, and
b-parameters. Overall, all parameterizations yielded satisfactory standard error recovery.

For the between-item MIRT model, all replications successfully converged, but not all
replications entered into the final results. In particular, for the marginal-standardized factor
parameterization, 17% of the replications had negative factor loadings, resulting in “NaN”
for the transformed SEs (due to negative variance estimates after transformation). Even though
these replications were excluded from our results, a closer inspection revealed that the negative
factor loadings were due to the flipping (reversing) of the corresponding factor. For instance, when
$\theta_1$ was reversed, all items loaded on $\theta_1$ had negative loadings and $\theta_1$ was also correlated
negatively with $\theta_2$ and $\theta_3$. In this case, researchers could either manually multiply the
parameters related to $\theta_1$ by -1, or add non-negativity constraints on factor loadings during
estimation. Please note that the identifiability constraints in Table 1 do not preclude the
possibility of factor flipping. Hence, in the marginal-standardized factor parameterization,
researchers need to be aware that models could be equivalent up to factor reversing, and the
model is strictly identified only if non-negativity constraints are added.
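The manual correction for a reversed factor can be sketched as below: negating the loadings on the flipped factor together with its covariances with the other factors leaves the model-implied covariance matrix of the latent responses unchanged. This is only an illustration for standardized factors; the function names and numeric values are hypothetical, and the corresponding sign changes in the error covariance matrix are not handled here.

```python
def reflect_factor(loadings, factor_cov, k):
    """Reverse factor k: negate its loadings and its covariances with other factors.

    loadings is an I x K list of rows; factor_cov is a K x K list of rows.
    The variance of factor k (the diagonal entry) is left unchanged.
    """
    new_loadings = [[-l if j == k else l for j, l in enumerate(row)] for row in loadings]
    K = len(factor_cov)
    new_cov = [
        [(-1 if (s == k) != (t == k) else 1) * factor_cov[s][t] for t in range(K)]
        for s in range(K)
    ]
    return new_loadings, new_cov

def implied_cov(loadings, factor_cov):
    """Common-factor part of the model-implied covariance, Lambda cov(xi) Lambda'."""
    K = len(factor_cov)
    return [
        [sum(ri[s] * factor_cov[s][t] * rj[t] for s in range(K) for t in range(K))
         for rj in loadings]
        for ri in loadings
    ]

# Hypothetical flipped solution: the first factor came out reversed.
flipped_loadings = [[-0.7, 0.0], [-0.6, 0.0], [0.0, 0.8]]
flipped_cov = [[1.0, -0.4], [-0.4, 1.0]]
fixed_loadings, fixed_cov = reflect_factor(flipped_loadings, flipped_cov, 0)
```

After the reflection all loadings are non-negative and the factor correlation is positive, while `implied_cov` returns the same matrix before and after, which is the sense in which the two solutions are equivalent.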

Similarly, 18% of the replications from the marginal-reference parameterization and 30.5% of the
replications from the conditional-reference parameterization were excluded due to the observed
“NaN” from the SEs of the transformed d-parameters. There is no clear explanation for why the
variance of the transformed d-parameters became negative; one possible reason is that the
original sandwich SE estimates of those parameters were relatively high. The results in Table 7
were therefore based on the remaining replications per parameterization. Again, overall, all four
parameterizations resulted in acceptable parameter and standard error recovery with no
noticeable differences, except for the marginal-reference parameterization, for which the relative
bias of standard error recovery is considerably higher. Even so, the actual bias is still acceptable.
This is not surprising because the marginal-reference combination requires the most complex
transformation. Because the multivariate delta method relies on the first-order approximation of
a Taylor series, it introduces some error by ignoring the higher-order terms. Please also note that
the transformation formula for the reference parameterization depends on the estimated mean
and covariance of the latent factors. Hence, any estimation error in the factor mean and
covariance will be carried over to the transformed SEs. The SE recovery of the b-parameters was
comparable to that of the d-parameters.

For within-item multidimensional models, we ran 200 replications, among which only 131
replications converged properly in Mplus. The non-converged cases only happened for the
marginal-standardized factor parameterization, and when non-convergence occurred, the entire replication was
eliminated although the other three parameterizations still yielded converged estimates. Out of
these 131 replications, only 63 replications from the marginal-standardized factor
parameterization entered the final calculation because the other 68 replications produced negative
correlation estimates among the $\theta$’s, which again distorted the parameter estimates and their standard
error estimates. The negative correlations again could be explained by possible reversing of some
factors. For the other three parameterizations, all 131 replications were included in the final
results.

An elimination criterion was also used for the bi-factor model results. Again, out of 200
replications, only 175 converged properly, and non-convergence happened for the marginal and
conditional reference indicator parameterizations. For these two parameterizations, the anomaly
occurred when the factor mean estimates were extreme (i.e., >20 or <-20). Because the bi-factor
model contains a larger number of parameters than the within-item or between-item
MIRT models, it is unsurprising that fixing one item’s parameters may sometimes not be enough to fix the
scale, resulting in extreme factor mean estimates. Among these converged
replications, only 113 yielded proper parameter and standard error estimates for the marginal and
conditional reference indicator parameterizations. The other 62 replications again yielded
extreme factor mean estimates (i.e., absolute values of factor means exceeding 10). For both the
within-item MIRT and bi-factor models, similar trends continue to hold. That is, all four
parameterizations seemed to work equally well, except for a few cells, such as the $a_2$ parameter in
the marginal-standardized factor parameterization from the within-item MIRT model, and the $a_g$
parameter in the marginal-standardized factor parameterization from the bi-factor model. Again,
even for these exceptional cells, the actual bias of the SE is still acceptable.

In sum, for all four models considered in the study, the SEs from successful
replications are comparable across the four parameterizations. However, given the
large proportion of unsuccessful replications observed in some cases, we have the following
recommendations:

1. UIRT: Any of the four parameterizations is fine.

2. Between-item/Within-item MIRT: The marginal-standardized parameterization may
result in factor reversing, which is easy to spot and correct. The marginal-reference and
conditional-reference parameterizations sometimes yield invalid SEs of d-parameters
after transformation, and future studies need to be conducted to further explore the
reasons.

3. Bi-factor model: The marginal-reference and conditional-reference parameterizations
may sometimes yield extreme factor means, which lead to either non-convergence or
invalid transformed SEs.

4. For multidimensional models in general, the conditional-standardized parameterization is
recommended.
A Real Data Example 


For illustration purposes, a unidimensional factor analysis model employing each of the four
parameterizations was fit to the National Educational Longitudinal Study (NELS) science test
data. The test consists of 25 dichotomous items, and the full sample size is 13,487. We randomly
sampled 1,000 students for illustration because otherwise the standard errors of the parameter
estimates would all be smaller than .0001, and the Mplus default output of four decimal places could
not capture such small values. The same data set was also analyzed in flexMIRT (Cai, 2017) to obtain
the IRT parameter estimates.

Table 10 presents the FA model parameter estimates for the NELS science test data,
along with the robust SEs. Unsurprisingly, both the parameter estimates and their SEs are quite
different across the four parameterizations. Table 11 then presents the transformed values of the
item parameter estimates and the computed SEs of the transformed item parameter estimates.
Consistent with our expectation, after transformation the item parameter estimates are all quite
similar, and they are also close to the direct IRT parameter estimates. The SEs of the transformed
parameter values also look similar, with the differences mostly appearing in the third decimal place.
The SEs from the direct IRT model fitting are not directly comparable because they are obtained
from different estimators (sandwich estimator vs. supplemental EM), but the values are still
close.
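
To make the transformation step concrete, the following Python sketch converts a unidimensional marginal-standardized loading and threshold to the IRT slope and intercept and applies a first-order delta method to the SEs. It is a simplified illustration under stated assumptions, not the general formula of this note: the scaling constant D = 1.7, the function name, and the default zero covariance between the loading and the threshold are all assumptions introduced here. With the item 1 estimates from Table 10 (loading 0.516, threshold -0.516) the point estimates agree with the 1.024 reported in Table 11; no claim is made that the SEs from this simplified sketch reproduce the tabled SEs.

```python
import math

D = 1.7  # assumed normal-ogive-to-logistic scaling constant

def fa_to_irt_marginal_std(lam, tau, se_lam, se_tau, cov_lam_tau=0.0):
    """Unidimensional marginal-standardized FA estimates -> IRT (a, d) with delta-method SEs.

    cov_lam_tau is the sampling covariance of the loading and threshold;
    it defaults to 0 here purely for illustration.
    """
    scale = math.sqrt(1.0 - lam ** 2)
    a = D * lam / scale
    d = -D * tau / scale

    # First derivatives of (a, d) with respect to (lam, tau)
    da_dlam = D / scale ** 3
    dd_dlam = -D * tau * lam / scale ** 3
    dd_dtau = -D / scale

    # Delta method: Var(g) is approximately J * Cov * J'
    var_a = da_dlam ** 2 * se_lam ** 2
    var_d = (dd_dlam ** 2 * se_lam ** 2
             + dd_dtau ** 2 * se_tau ** 2
             + 2.0 * dd_dlam * dd_dtau * cov_lam_tau)
    return a, d, math.sqrt(var_a), math.sqrt(var_d)

# Item 1, marginal-standardized estimates and SEs from Table 10
a, d, se_a, se_d = fa_to_irt_marginal_std(0.516, -0.516, 0.038, 0.042)
```

In practice the full estimated covariance matrix of the FA parameters (for example, from the Mplus TECH3 output) would replace the two SEs and the assumed zero covariance.
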


Discussion 
In this note, we provide a general conversion formula for transforming the standard errors
of item parameter estimates from four different parameterizations in the factor analysis framework to
the corresponding IRT parameterization. The conversion formula is suitable for a broad family of
models, such as unidimensional, correlated-factor, and bi-factor models. This note is motivated
by the observation that computing the standard errors of transformed parameter values is
poorly documented, yet it is needed, for instance, to compare standard error
estimates across different studies for meta-analytic purposes.

Previous research comparing standard errors between WLS and IRT estimation methods
(such as EM) mainly relied on empirical standard errors, computed as the standard deviation
of parameter estimates across replications (Finch, 2010, 2011). The conversion
formulas provided in this note instead make it possible to compute and compare SEs directly across
different parameterizations. A simulation study was conducted to empirically evaluate the performance of
the proposed conversion formula. Because the robust standard errors for WLS estimates have been
shown to work well (Forero & Maydeu-Olivares, 2009) in the factor analytic framework, the SEs
of the transformed parameters also showed negligible bias. Even though we did not manipulate
any factors (such as test length or sample size) in the simulation design, we considered four
different IRT models that encompass the majority of applications. The R script written
for this study is available to interested researchers who need to obtain the standard errors of item
parameters in the IRT parameterization when the parameters are estimated in FA models.

Please note that we do not claim that the transformed SEs from FA models using
WLSMV are directly equivalent to those obtained from IRT models via FIML. This is because
the sandwich estimator of SEs differs fundamentally from the supplemented EM (SEM) and
cross-product (XPD) methods (limited-information vs. full-information approaches). Instead, we
claim only that the transformed SEs from WLSMV are appropriate for the parameter estimates
that are transformed to the IRT metric.

On a last note, although the SEs from the four different parameterizations in the CFA
framework can be transformed to the SEs of the item parameters in the IRT metric, users need to
exercise caution when using the SEs for Wald-type hypothesis testing in DIF and/or item drift
analysis. According to Gonzalez and Griffin (2001), in CFA, “alternative but equivalent ways to
identify a model may yield different standard errors, and hence different Z test for a parameter”.
This lack of identification invariance of SEs implies that a parameter’s SE, and hence its
significance test, can be sensitive to the arbitrary choice of identification (i.e., reference indicator vs.
standardized factor). Although their conclusion was drawn for continuous factor analysis, it is likely
to generalize to categorical CFA because of the nonlinear transformation
reflected in Table 2. Future research should look into this issue more closely. It is especially
important to check how the Type I error and power of Wald-based DIF/item drift analysis
methods may be affected by the different identification parameterizations of IRT models.
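
The sensitivity of Wald statistics to the identification choice can be illustrated with the item 2 loading in Table 10. The short Python sketch below is only an illustration: the helper `wald_z` is a hypothetical name, and the test of a zero loading is chosen for simplicity rather than being a DIF or drift test.

```python
def wald_z(estimate, se, null_value=0.0):
    """Wald z statistic for a single parameter."""
    return (estimate - null_value) / se

# Item 2 loading and SE under two equivalent identifications (Table 10)
z_standardized = wald_z(0.47, 0.042)   # marginal, standardized factor
z_reference = wald_z(0.912, 0.104)     # marginal, reference indicator

print(round(z_standardized, 2), round(z_reference, 2))  # 11.19 8.77
```

Both statistics test the same substantive hypothesis for the same fitted model, yet they differ by more than two units, which is the lack of identification invariance described by Gonzalez and Griffin (2001).
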
Table 4. Summary of Generation Models.

             UIRT                             Between-item MIRT                 Within-item MIRT                  Bi-factor
Item         a_i ~ logN(μ_a = 0, σ_a = 0.2)   a_ik ~ logN(μ_a = 0, σ_a = 0.2)   a_ik ~ logN(μ_a = 0, σ_a = 0.2)   (illegible)
Parameter    d_i ~ U(-2, 2)                   d_i ~ U(-2, 2)                    d_i ~ U(-2, 2)                    d_i ~ U(-2, 2)

Person       θ_i ~ N(μ_θ = 0, σ_θ = 1)        θ ~ MVN(0, Σ)                     θ ~ MVN(0, Σ)                     θ_GEN ~ N(μ_θ = 0, σ_θ = 1)
Parameter                                                                                                         θ_GR ~ N(μ_θ = 0, σ_θ = 1)

             In the MIRT columns, 0 = (0, 0, 0)' and Σ has 1 on the diagonal and 0.5 in every off-diagonal position.

Test         [path diagrams of the four test structures]
Structure


Note. UIRT = Unidimensional IRT model. For the test structure of bi-factor models, θ_0 is the vector of the general factor, and θ_1, θ_2, and
θ_3 are the vectors of the group factors.
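
As a rough illustration of the UIRT column of Table 4, the following Python sketch generates item parameters, person parameters, and binary responses. It is an assumption-laden sketch rather than the study's generation code: the sample size, test length, and seed are hypothetical, σ_a is interpreted as the standard deviation on the log scale, and responses are drawn from a logistic 2PL model in slope-intercept form.

```python
import math
import random

def simulate_uirt(n_persons=200, n_items=10, seed=0):
    """Generate one unidimensional 2PL data set (sizes and seed are hypothetical)."""
    rng = random.Random(seed)
    # Slopes: log-normal with log-scale mean 0 and SD 0.2
    a = [rng.lognormvariate(0.0, 0.2) for _ in range(n_items)]
    # Intercepts: uniform on [-2, 2]
    d = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
    # Person parameters: standard normal
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]

    responses = []
    for t in theta:
        row = []
        for a_i, d_i in zip(a, d):
            # Logistic 2PL in slope-intercept form (assumed response function)
            p = 1.0 / (1.0 + math.exp(-(a_i * t + d_i)))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return a, d, theta, responses
```

The multidimensional conditions would additionally require correlated person parameters with the covariance structure shown in the table.
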


Table 6. Parameter and standard error recovery of Unidimensional IRT Model.


                        Marginal                                        Conditional
                Standardized             Reference              Standardized             Reference
           RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB

a   Par.   .1221   .0056   .0065    .1217   .0048   .0060    .1221   .0056   .0065    .1217   .0048   .0060
    S.E.   .0101  -.0034  -.0293    .0101  -.0036  -.0305    .0101  -.0035  -.0292    .0101  -.0036  -.0305
d   Par.   .0927  -.0061   .0216    .0925  -.0053   .0219    .0927  -.0061   .0216    .0925  -.0052   .0218
    S.E.   .0088   .0010   .0079    .0074  -.0023  -.0252    .0072  -.0022  -.0240    .0079   .0008   .0102
b   Par.   .1634   .0057   .0258    .1669   .0055   .0269    .1634   .0058   .0258    .1668   .0054   .0268
    S.E.   .0331  -.0029  -.0093    .0341  -.0030  -.0089    .0331  -.0029  -.0094    .0341  -.0029  -.0085


Note. For each parameter, the first row is the parameter recovery and the second row refers to SE recovery.
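
The three recovery criteria reported in Tables 6 to 9 can be computed along the following lines. This Python sketch rests on assumptions: the function name is hypothetical, RB is taken to mean relative bias (bias divided by the true value), and the sketch handles a single parameter across replications, whereas the tabled values presumably also average over items.

```python
import math

def recovery_summary(estimates, true_value):
    """RMSE, bias, and relative bias of replicated estimates of one parameter."""
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    rel_bias = bias / true_value  # assumed definition of RB
    return rmse, bias, rel_bias

# Four hypothetical replicate estimates of a slope whose true value is 1.0
rmse, bias, rel_bias = recovery_summary([0.9, 1.1, 1.0, 1.2], 1.0)
```

For SE recovery, the same function could be applied to the transformed SEs, with the empirical standard deviation of the parameter estimates across replications serving as the reference value; that choice of reference is likewise an assumption here.
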


Table 7. Parameter and standard error recovery of between-item MIRT Model.


                        Marginal                                        Conditional
                Standardized             Reference              Standardized             Reference
           RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB

a1  Par.   .1214   .0149   .0160    .1202   .0172   .0185    .1200   .0151   .0162    .1202   .0172   .0186
    S.E.   .0156   .0014   .0201    .0465   .0422   .3759    .0113   .0005   .0084    .0142  -.0002   .0073
a2  Par.   .1191   .0131   .0171    .1193   .0104   .0137    .1191   .0125   .0162    .1192   .0103   .0137
    S.E.   .0125  -.0005  -.0015    .0337   .0295   .2580    .0109  -.0013  -.0064    .0127  -.0017  -.0074
a3  Par.   .1234   .0108   .0107    .1213   .0151   .0146    .1237   .0109   .0109    .1213   .0151   .0146
    S.E.   .0153  -.0014  -.0066    .0289   .0233   .2003    .0110  -.0016  -.0112    .0116  -.0023  -.0154
d   Par.   .0899  -.0046   .0379    .0887  -.0030   .0369    .0892  -.0036   .0353    .0887  -.0030   .0367
    S.E.   .0087   .0046   .0605    .0232   .0077   .0907    .0080  -.0003  -.0018    .0187  -.0000   .0029
b   Par.   .1513   .0050   .0357    .1483   .0072   .0342    .1511   .0038   .0339    .1483   .0071   .0341
    S.E.   .0331   .0004   .0292    .0377  -.0034  -.0282    .0319  -.0021  -.0125    .0371  -.0011   .0009


Note. There are 17% abnormal replications from the marginal-standardized parameterization (i.e., negative loading parameters
resulting in “NaN” for the transformed standard errors), 18% from the marginal-reference parameterization, and 30.5%
from the conditional-reference parameterization. These replications were excluded from the summary.


Table 8. Parameter and standard error recovery of Within-item MIRT Model.


                        Marginal                                        Conditional
                Standardized             Reference              Standardized             Reference
           RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB

a1  Par.   .1763   .0187   .0202    .1746   .0000  -.0004    .1745  -.0002  -.0006    .1746  -.0001  -.0005
    S.E.   .0219  -.0093  -.0471    .0186  -.0064  -.0358    .0281   .0116   .0585    .0189  -.0085  -.0470
a2  Par.   .3066  -.0573  -.0556    .1804  -.0102  -.0095    .1803  -.0104  -.0097    .1803  -.0104  -.0097
    S.E.   .1396  -.1317  -.4373    .0209  -.0090  -.0465    .0298   .0118   .0627    .0217  -.0115  -.0592
a3  Par.   .1737  -.0228  -.0231    .1729  -.0053  -.0051    .1729  -.0052  -.0051    .1729  -.0052  -.0051
    S.E.   .0215  -.0060  -.0258    .0190  -.0042  -.0199    .0329   .0165   .0950    .0194  -.0063  -.0313
d   Par.   .1142   .0088   .0026    .1154   .0024   .0064    .1137   .0011   .0071    .1153   .0025   .0062
    S.E.   .0129   .0005   .0108    .0145   .0035   .0299    .0107   .0002  -.0059    .0110   .0002  -.0063


Note. 131/200 valid replications. Among the valid replications, 34% of the replications from the marginal-standardized parameterization
failed to produce valid transformed SEs because of negative estimated loadings.


Table 9. Parameter and standard error recovery of bi-factor IRT Model.


                        Marginal                                        Conditional
                Standardized             Reference              Standardized             Reference
           RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB       RMSE    bias    RB

a0  Par.   .1322   .0015   .0056    .1317   .0054   .0066    .1322   .0045   .0056    .1316   .0054   .0066
    S.E.   .0681   .0574   .4362    .0334   .0101   .0784    .0117  -.0009  -.0041    .0115  -.0007  -.0032
a1  Par.   .1539  -.0017  -.0021    .1519  -.0019  -.0023    .1539  -.0017  -.0021    .1519  -.0018  -.0022
    S.E.   .0179  -.0057  -.0304    .0237   .0000   .0098    .0179  -.0056  -.0303    .0189  -.0077  -.0434
a2  Par.   .1418   .0062   .0074    .1430   .0057   .0066    .1418   .0062   .0073    .1431   .0057   .0066
    S.E.   .0136  -.0033  -.0203    .0461   .0197   .1343    .0136  -.0033  -.0202    .0139  -.0017  -.0068
a3  Par.   .1444   .0039   .0049    .1456   .0031   .0039    .1444   .0039   .0049    .1456   .0033   .0042
    S.E.   .0126  -.0016  -.0090    .0240   .0063   .0453    .0126  -.0016  -.0091    .0152  -.0018  -.0071
d   Par.   .1037   .0000   .0058    .1037   .0003   .0066    .1037   .0001   .0056    .1036   .0004   .0070
    S.E.   .0095  -.0006  -.0017    .0128   .0000   .0052    .0095  -.0006  -.0017    .0118   .0004   .0078


Note. 175/200 valid replications. Among them, 43.5% of the replications from both the marginal-reference and conditional-reference
parameterizations failed to produce valid transformed SEs because of extreme factor mean estimates (i.e., absolute values exceeding 10).


Table 10. Factor analysis model parameter estimates for NELS Science data. 


Marginal Conditional 
Item Standardized Reference Standardized Reference 
No. Factor Indicator Factor Indicator 
λ τ λ τ λ τ λ τ
1 0.516 -0.516 1 0 0.602 -0.602 1 0 
(0.038) (0.042) (0.061) (0.05) 
2 0.47 -0.827 0.912 -0.357 0.533 -0.938 0.885 -0.405 
(0.042) (0.045) (0.104) (0.08) (0.06) (0.054) (0.132) (0.085) 
3 0.373 -0.479 0.723 -0.106 0.402 -0.516 0.668 -0.114 
(0.042) (0.041) (0.099) (0.075) (0.053) (0.045) (0.113) (0.079) 
4 0.399 -0.53 0.773 -0.132 0.435 -0.578 0.722 -0.144 
(0.041) (0.042) (0.096) (0.07) (0.054) (0.046) (0.113) (0.075) 
5 0.658 -0.824 1.275 -0.166 0.874 -1.094 1.451 -0.221 
(0.034) (0.045) (0.114) (0.092) (0.08) (0.068) (0.195) (0.118) 
6 0.587 -0.852 1.138 -0.265 0.726 -1.053 1.204 -0.328 
(0.037) (0.045) (0.108) (0.085) (0.071) (0.062) (0.163) (0.1) 
7 0.394 -0.49 0.763 -0.097 0.428 -0.533 0.711 -0.105 
(0.041) (0.041) (0.095) (0.071) (0.053) (0.046) (0.11) (0.076) 
8 0.423 -0.24 0.82 0.183 0.467 -0.265 0.775 0.202 
(0.04) (0.04) (0.095) (0.07) (0.053) (0.044) (0.114) (0.079) 
9 0.498 -0.479 0.966 0.019 0.575 -0.552 0.954 0.022 
(0.039) (0.041) (0.105) (0.081) (0.06) (0.049) (0.14) (0.094) 
10 0.544 -0.253 1.055 0.291 0.649 -0.302 1.077 0.347 
(0.037) (0.04) (0.107) (0.081) (0.063) (0.048) (0.152) (0.102) 
11 0.294 -0.048 0.569 0.246 0.307 -0.05 0.51 0.257 
(0.043) (0.04) (0.091) (0.065) (0.049) (0.041) (0.094) (0.07) 
12 0.572 -0.687 1.109 -0.115 0.698 -0.838 1.159 -0.14 
(0.037) (0.043) (0.107) (0.085) (0.067) (0.056) (0.158) (0.101) 
13 0.459 -0.732 0.889 -0.274 0.516 -0.824 0.857 -0.308 
(0.043) (0.044) (0.105) (0.078) (0.062) (0.052) (0.132) (0.083) 
14 0.679 -0.192 1.315 0.487 0.924 -0.261 1.534 0.663 
(0.033) (0.04) (0.114) (0.09) (0.083) (0.055) (0.203) (0.135) 
15 0.438 0.202 0.848 0.639 0.487 0.225 0.808 0.711 
(0.041) (0.04) (0.101) (0.074) (0.056) (0.045) (0.124) (0.092) 
16 0.325 -0.07 0.629 0.254 0.343 -0.074 0.569 0.269 
(0.042) (0.04) (0.096) (0.071) (0.05) (0.042) (0.103) (0.078) 
17 0.512 0.123 0.992 0.635 0.596 0.143 0.989 0.739 
(0.039) (0.04) (0.104) (0.077) (0.062) (0.046) (0.142) (0.101) 
18 0.496 -0.058 0.961 0.438 0.571 -0.066 0.948 0.504 
(0.039) (0.04) (0.102) (0.078) (0.059) (0.046) (0.136) (0.097) 
19 0.454 0.136 0.88 0.589 0.509 0.152 0.846 0.662 
(0.04) (0.04) (0.101) (0.075) (0.057) (0.045) (0.127) (0.094) 
20 0.282 0.148 0.546 0.43 0.294 0.155 0.488 0.449 
(0.043) (0.04) (0.091) (0.066) (0.049) (0.042) (0.092) (0.072) 
21 0.34 0.093 0.658 0.432 0.361 0.099 0.599 0.46 
(0.042) (0.04) (0.096) (0.071) (0.051) (0.042) (0.104) (0.08) 
22 0.219 0.285 0.424 0.503 0.224 0.292 0.372 0.515 
(0.045) (0.04) (0.092) (0.064) (0.049) (0.041) (0.087) (0.069) 
23 0.214 0.176 0.415 0.39 0.219 0.181 0.364 0.4 
(0.044) (0.04) (0.09) (0.062) (0.048) (0.041) (0.085) (0.067) 
24 0.527 0.356 1.022 0.883 0.621 0.419 1.03 1.039 
(0.04) (0.041) (0.105) (0.077) (0.065) (0.049) (0.146) (0.108) 
25 0.297 0.729 0.575 1.025 0.31 0.763 0.515 1.074 
(0.051) (0.044) (0.108) (0.074) (0.058) (0.047) (0.111) (0.091) 


Note. Values in parentheses are standard errors. Under both reference indicator
parameterizations, the first item was chosen as the anchor item, of which the intercept
and slope were fixed as 0 and 1, respectively. The mean (μ_ξ) and variance (σ²_ξ) of
factors under both standardized factor parameterizations were fixed as 0 and 1,
respectively. Under the marginal-reference parameterization, the estimated value and
standard error of μ_ξ were 0.516 and 0.042, and the estimated value and standard error
of σ²_ξ were 0.266 and 0.04, respectively. Under the conditional-reference
parameterization, the estimated value and standard error of μ_ξ were 0.602 and 0.05,
and the estimated value and standard error of σ²_ξ were 0.363 and 0.073, respectively.


Table 11. Transformed IRT parameter estimates and direct estimates of IRT 
parameters and their corresponding standard errors for NELS Science data. 


Marginal Conditional Direct 
Item Standardized Reference Standardized Reference IRT 
No. Factor Indicator Factor Indicator 
a d a d a d a d a d 
1 1.024 1.024 1.023 1.024 1.023 1.023 1.024 1.023 1.04 1.02 
(0.086) (0.092) (0.108) (0.076) (0.12) (0.09) 
2 0.905 1.593 0.906 1.594 0.906 1.595 0.906 1.594 0.98 1.61 
(0.111) (0.096) (0.104) (0.097) (0.108) (0.093) (0.106) (0.088) (0.13) (0.11) 
3 0.683 0.878 0.683 0.878 0.683 0.877 0.684 0.877 0.68 0.85 
(0.095) (0.084) (0.100) (0.087) (0.093) (0.076) (0.092) (0.054) (0.10) (0.08) 
4 0.740 0.983 0.739 0.984 0.740 0.983 0.740 0.984 0.75 0.96 
(0.099) (0.085) (0.098) (0.080) (0.093) (0.076) (0.092) (0.080) (0.10) (0.08) 
5 1.485 1.860 1.484 1.859 1.486 1.860 1.486 1.861 1.70 2.00 
(0.126) (0.122) (0.149) (0.071) (0.132) (0.120) (0.12) (0.129) (0.19) (0.16) 
6 1.233 1.789 1.232 1.789 1.234 1.790 1.233 1.790 1.37 1.87 
(0.101) (0.107) (0.151) (0.149) (0.120) (0.108) (0.122) (0.065) (0.16) (0.14) 
7 0.729 0.906 0.728 0.907 0.728 0.906 0.728 0.906 0.74 0.89 
(0.098) (0.085) (0.097) (0.096) (0.093) (0.076) (0.086) (0.079) (0.10) (0.08) 
8 0.794 0.450 0.793 0.450 0.794 0.451 0.794 0.450 0.78 0.44 
(0.102) (0.085) (0.103) (0.081) (0.093) (0.076) (0.092) (0.054) (0.10) (0.08) 
9 0.976 0.939 0.977 0.940 0.978 0.938 0.977 0.939 0.98 0.93 
(0.117) (0.092) (0.110) (0.088) (0.108) (0.076) (0.104) (0.092) (0.11) (0.09) 
10 1.102 0.513 1.102 0.513 1.103 0.513 1.103 0.512 1.09 0.52 
(0.091) (0.091) (0.124) (0.088) (0.108) (0.076) (0.104) (0.046) (0.11) (0.08) 
11 0.523 0.085 0.522 0.085 0.522 0.085 0.522 0.085 0.49 0.08 
(0.087) (0.080) (0.078) (0.082) (0.076) (0.076) (0.078) (0.057) (0.08) (0.07) 
12 1.185 1.424 1.185 1.424 1.187 1.425 1.187 1.424 1.29 1.47 
(0.097) (0.100) (0.135) (0.065) (0.108) (0.093) (0.113) (0.085) (0.14) (0.11) 
13 0.878 1.401 0.877 1.402 0.877 1.401 0.878 1.401 0.91 1.40 
(0.108) (0.093) (0.101) (0.100) (0.108) (0.093) (0.101) (0.085) (0.11) (0.10) 
14 1.572 0.445 1.569 0.443 1.571 0.444 1.571 0.443 1.56 0.47 
(0.136) (0.105) (0.163) (0.074) (0.142) (0.093) (0.133) (0.082) (0.14) (0.10) 
15 0.828 -0.382 0.827 -0.381 0.828 -0.383 0.828 -0.382 0.76 -0.36 
(0.105) (0.085) (0.090) (0.099) (0.093) (0.076) (0.091) (0.083) (0.09) (0.08) 
16 0.584 0.126 0.583 0.127 0.583 0.126 0.583 0.125 0.57 0.12 
(0.090) (0.080) (0.087) (0.080) (0.076) (0.076) (0.081) (0.084) (0.09) (0.07) 
17 1.013 -0.243 1.012 -0.244 1.013 -0.243 1.013 -0.244 0.97 -0.23 
(0.120) (0.089) (0.114) (0.063) (0.108) (0.076) (0.103) (0.090) (0.10) (0.08) 
18 0.971 0.114 0.970 0.113 0.971 0.112 0.971 0.113 0.94 0.12 
(0.116) (0.088) (0.101) (0.043) (0.108) (0.076) (0.093) (0.081) (0.10) (0.08) 
19 0.866 -0.259 0.866 -0.257 0.865 -0.258 0.867 -0.260 0.83 -0.24 
(0.107) (0.086) (0.092) (0.097) (0.093) (0.076) (0.096) (0.089) (0.09) (0.08) 
20 0.500 -0.262 0.499 -0.263 0.500 -0.264 0.500 -0.264 0.50 -0.25 
(0.086) (0.079) (0.089) (0.074) (0.076) (0.076) (0.087) (0.076) (0.08) (0.07) 
21 0.615 -0.168 0.613 -0.167 0.614 -0.168 0.614 -0.169 0.59 -0.16 
(0.091) (0.081) (0.089) (0.076) (0.093) (0.076) (0.090) (0.055) (0.09) (0.07) 
22 0.382 -0.497 0.381 -0.495 0.381 -0.496 0.381 -0.495 0.37 -0.47 
(0.082) (0.078) (0.083) (0.067) (0.076) (0.076) (0.087) (0.064) (0.08) (0.07) 
23 0.372 -0.306 0.372 -0.306 0.372 -0.308 0.373 -0.307 0.35 -0.29 
(0.082) (0.078) (0.082) (0.067) (0.076) (0.076) (0.081) (0.076) (0.08) (0.07) 
24 1.054 -0.712 1.054 -0.711 1.056 -0.712 1.055 -0.712 0.99 -0.68 
(0.124) (0.092) (0.118) (0.076) (0.108) (0.076) (0.108) (0.094) (0.10) (0.08) 
25 0.529 -1.298 0.528 -1.296 0.527 -1.297 0.527 -1.299 0.53 -1.26 
(0.107) (0.083) (0.101) (0.106) (0.093) (0.076) (0.096) (0.067) (0.09) (0.08) 


Note. Values in parentheses are standard errors. The direct IRT model fitting was conducted 
using flexMIRT. 
Appendix A: Γ_a and Γ_d in Table 3 for different parameterizations 


Al. Marginal-Reference Indicator 


The analytic forms for the first derivatives are presented below, the non-presented terms, such as 


} is a O-vector. 


ec il 
ce (¢)) 


[1-47 cov(eya, +N Ay V(E) 
Oa, (1-4; cov(&)a,)*” 
on, AV (S i Ni, 
(1-27 cov(é)a,)” 


AV (E)” Agd is” “it 
a, _ 2x cow )a,™ 


2x(1-27 cov(&)a,)*” 


ad, _F@yI- 27 cov(é)a, -[1- 17 cov(e)a, | N,(z, -¥ E()) 


On, 1-2) cov(E)A, 
[1-27 cov(é)a, ] 


Si = [1-47 cove ya, |” 
Od, = -| ¢, —ATE(E) Agr 
A(cov(€)),, 2x(1-a7 cov()a,)”” 


ad. 2. 


1S 


O(E(¢)), - 1-A! cov(€)A, 


A2. Conditional-Reference Indicator 


s=t#kors#t 


s=t=k 


The analytic forms for the first derivatives are presented as follows. The non-presented terms, such 
as vec} ——+— |, vec} ———— |], and vec| ———+— | are a 0-vector. 
O(E(S)) O(E(S)), O(cov()) 


Ody, y2 
pueden 4 
a 
Oa, =i -12 
A(cov(é))y 2" (Ss) 
ad. 
L =F 
a, (¢) 
Ody 2 5 
OT, 


A3. Marginal-Standardized Factor 
The analytic forms for the first derivatives are presented as follows. In all the following equations,
cov(ξ) = cor(ξ) because the factors are standardized.


[1 —h; cov(S)a; + Ny Ary ] 


=k 
a, | (1-27 cov(é)2,)°” : 
on,, Ai N, is eee 
(127 coven, 

Ody, = Ai A, A, 
O(cov(E)),, 2x (1-2; cow(&)a,)”” 
ad, -,[1-a7 cove), [' N, = -aN, 
On, 1-4; cov()A, fe a” cov(e), ir 
ad, a 

‘=—|1-2! cov(E)a, 

Bg Lt ov] 

Cd, —t.A.d. 


U ts* “it 


O(cov(é)),, ~ 2x (1-47 cov(&)a,)*” 
A4. The analytic forms for Γ_b in Table 3 
Reference Indicator 


i 


Ob; = -E(é, in se [7 = A; E(&)] 


for 2PL. 
On, AV (EY 


d thi tion simplifies to a 
, an 1s equation simplTl —————— 
: : a, VO” 


at 
Ot. AVE 


Aco), WE” | A 


ik 


ab, 1 e -AlE(é) 


} and the first derivative of b; with respect to other 


elements in cov(&) are all 0’s. 


Ob; = -A., 
O(E(E)), AVG)? 


Standardized Factor 


ab, 


ie 


os and —+ : 
Ot, Ay ON, Ai 


. The first derivative of b; with respect to all other parameters are 0. 
Appendix B: Mplus syntax for bi-factor models 


B1. Marginal standardized factor parameterization 


TITLE:     binary bi-factor analysis model BY MARGINAL STANDARDIZED FACTOR
DATA:      FILE IS bif.dat;
VARIABLE:  NAMES ARE u1-u45;
           CATEGORICAL ARE u1-u45;
ANALYSIS:  MODEL = NOCOVARIANCES;
           TYPE = GENERAL;
           ESTIMATOR = WLSMV;
           PARAMETERIZATION = DELTA;
! DELTA = marginal parameterization; THETA = conditional parameterization.
! The asterisk frees the first loading of each factor.
MODEL:     skill0 BY u1* u2-u45;
           skill1 BY u1* u2-u15;
           skill2 BY u16* u17-u30;
           skill3 BY u31* u32-u45;
! Factor means are fixed at 0 and factor variances at 1.
           [skill1@0];
           skill1@1;
           [skill2@0];
           skill2@1;
           [skill3@0];
           skill3@1;
           [skill0@0];
           skill0@1;
OUTPUT:    TECH1, TECH3, TECH4;


B2. Marginal reference indicator parameterization 


TITLE:     binary bi-factor analysis model BY MARGINAL REFERENCE INDICATOR
DATA:      FILE IS bif.dat;
VARIABLE:  NAMES ARE u1-u45;
           CATEGORICAL ARE u1-u45;
ANALYSIS:  MODEL = NOCOVARIANCES;
           TYPE = GENERAL;
           ESTIMATOR = WLSMV;
           PARAMETERIZATION = DELTA;
! DELTA = marginal parameterization; THETA = conditional parameterization.
! u2 is the reference indicator of the general factor; the first listed
! indicator of each group factor has its loading fixed at 1 by default.
MODEL:     skill0 BY u1* u2@1 u3-u45;
           skill1 BY u1 u2-u15;
           skill2 BY u16 u17-u30;
           skill3 BY u31 u32-u45;
! Thresholds of the reference indicators are fixed at 0.
           [u1$1@0];
           [u2$1@0];
           [u16$1@0];
           [u31$1@0];
! Factor means are freely estimated.
           [skill0*];
           [skill1*];
           [skill2*];
           [skill3*];
OUTPUT:    TECH1, TECH3, TECH4;